Towards a Self-Extending Parser
Abstract
This paper discusses an approach to incremental learning in natural language processing. The technique of projecting and integrating semantic constraints to learn word definitions is analyzed as implemented in the POLITICS system. Extensions and improvements of this technique are developed. The problem of generalizing existing word meanings and understanding metaphorical uses of words is addressed in terms of semantic constraint integration.

1. Introduction

Natural language analysis, like most other subfields of Artificial Intelligence and Computational Linguistics, suffers from the fact that computer systems are unable to automatically better themselves. Automated learning is considered a very difficult problem, especially when applied to natural language understanding. Consequently, little effort has been focused on this problem. Some pioneering work in Artificial Intelligence, such as AM [1] and Winston's learning system [2], strove to learn or discover concept descriptions in well-defined domains. Although their efforts produced interesting ideas and techniques, these techniques do not fully extend to a domain as complex as natural language analysis.

Rather than attempting the formidable task of creating a language learning system, I will discuss techniques for incrementally increasing the abilities of a flexible language analyzer. There are many tasks that can be considered "incremental language learning". Initially the learning domain is restricted to learning the meaning of new words and generalizing existing word definitions. There are a number of A.I. techniques, and combinations of these techniques, capable of exhibiting incremental learning behavior. I first discuss FOULUP and POLITICS, two programs that exhibit a limited capability for incremental word learning.
Secondly, the technique of semantic constraint projection and integration, as implemented in POLITICS, is analyzed in some detail. Finally, I discuss the application of some general learning techniques to the problem of generalizing word definitions and understanding metaphors.

2. Learning From Script Expectations

Learning word definitions in semantically-rich contexts is perhaps one of the simpler tasks of incremental learning. Initially I confine my discussion to situations where the meaning of a word can be learned from the immediately surrounding context. Later I relax this criterion to see how global context and multiple examples can help to learn the meaning of unknown words.

The FOULUP program [3] learned the meaning of some unknown words in the context of applying a script to understand a story. Scripts [4, 5] are frame-like knowledge representations abstracting the important features and causal structure of mundane events. Scripts have general expectations of the actions and objects that will be encountered in processing a story. For instance, the restaurant script expects to see menus, waitresses, and customers ordering and eating food (at different prespecified times in the story). FOULUP took advantage of these script expectations to conclude that items referenced in the story, which were part of expected actions, were indeed names of objects that the script expected to see. These expectations were used to form definitions of new words. For instance, FOULUP induced the meaning of "Rabbit" in, "A Rabbit veered off the road and struck a tree," to be a self-propelled vehicle. The system used information about the automobile accident script to match the unknown word with the script-role "VEHICLE", because the script knows that the only objects that veer off roads to smash into road-side obstructions are self-propelled vehicles.
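The script-based matching described above can be illustrated with a small Python sketch. It is not FOULUP's implementation: the `ExpectedAction` record, the two-entry `ACCIDENT_SCRIPT`, and the `learn_from_script` function are hypothetical names introduced here, and the story is assumed to have already been reduced to (actor word, action) pairs. The sketch only shows the core idea that an unknown word filling the actor slot of an expected action inherits the meaning of the script role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedAction:
    """An action a script expects, plus the role that performs it."""
    action: str
    role: str
    role_meaning: str


# Hypothetical fragment of an automobile-accident script (not FOULUP's own data).
ACCIDENT_SCRIPT = [
    ExpectedAction("veer off road", "VEHICLE", "self-propelled vehicle"),
    ExpectedAction("strike obstruction", "VEHICLE", "self-propelled vehicle"),
]


def learn_from_script(script, observations, lexicon):
    """Map each unknown actor word to the single script role it can fill.

    observations: (actor_word, action) pairs taken from the story.
    lexicon: words whose meanings are already known (compared in lowercase).
    """
    candidates = {}
    for word, action in observations:
        if word.lower() in lexicon:
            continue  # known word: nothing to learn
        for expected in script:
            if expected.action == action:
                candidates.setdefault(word, set()).add(
                    (expected.role, expected.role_meaning)
                )
    # Accept a definition only when every matching expectation agrees on the role.
    return {w: next(iter(roles)) for w, roles in candidates.items() if len(roles) == 1}


story = [("Rabbit", "veer off road"), ("Rabbit", "strike obstruction")]
learned = learn_from_script(ACCIDENT_SCRIPT, story, lexicon={"road", "tree"})
assert learned == {"Rabbit": ("VEHICLE", "self-propelled vehicle")}
```

Requiring all matching expectations to agree on one role is a design choice of this sketch; it keeps the example from guessing when a word could fill more than one role.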
3. Constraint Projection in POLITICS

The POLITICS system [6, 7] induces the meanings of unknown words by a one-pass syntactic and semantic constraint projection followed by conceptual enrichment from planning and world-knowledge inferences. Consider how POLITICS proceeds when it encounters the unknown word "MPLA" in analyzing the sentence:

"Russia sent massive arms shipments to the MPLA in Angola."

Since "MPLA" follows the article "the" it must be a noun, adjective or adverb. After the word "MPLA", the preposition "in" is encountered, thus terminating the current prepositional phrase begun with "to". Hence, since all well-formed prepositional phrases require a head noun, and the "to" phrase has no other noun, "MPLA" must be the head noun. Thus, by projecting the syntactic constraints necessary for the sentence to be well formed, one learns the syntactic category of an unknown word.

It is not always possible to narrow the categorization of a word to a single syntactic category from one example. In such cases, I propose intersecting the sets of possible syntactic categories from more than one sample use of the unknown word until the intersection has a single element.

POLITICS learns the meaning of the unknown word by a similar, but substantially more complex, application of the same principle of projecting constraints from other parts of the sentence and subsequently integrating these constraints to construct a meaning representation. In the example above, POLITICS analyzes the verb "to send" as either an ATRANS or a PTRANS. (Schank [8] discusses the Conceptual Dependency case frames. Briefly, a PTRANS is a physical transfer of location, and an ATRANS is an abstract transfer of ownership, possession or control.)
The reason why POLITICS cannot decide on the type of TRANSfer is that it does not know whether the destination of the transfer (i.e., the MPLA) is a location or an agent. Physical objects, such as weapons, are PTRANSed to locations but ATRANSed to agents. The conceptual analysis of the sentence, with MPLA as yet unresolved, is diagrammed below:

[Conceptual Dependency diagram not recoverable from the source text. Legible fragments: *RUSSIA* as actor, a TRANS act, what appear to be recipient slots, and a location (<is> LOC) with value *ANGOLA*.]
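Two steps from this section can be illustrated with a short Python sketch: intersecting candidate syntactic categories, and deriving the possible TRANSfer readings from what is known about the destination. This is not the POLITICS code. The function names, the string labels, and the `TRANSFER_FOR_DESTINATION` table are assumptions introduced for illustration; the table simply encodes the statement that physical objects are PTRANSed to locations but ATRANSed to agents.

```python
def intersect_categories(constraint_sets):
    """Intersect candidate syntactic categories until at most one remains."""
    remaining = None
    for candidates in constraint_sets:
        remaining = set(candidates) if remaining is None else remaining & set(candidates)
        if len(remaining) <= 1:
            break  # a single category (or a contradiction) ends the search
    return remaining or set()


# Assumed rule table for a physical object being transferred:
# to a location it is a PTRANS, to an agent it is an ATRANS.
TRANSFER_FOR_DESTINATION = {"location": "PTRANS", "agent": "ATRANS"}


def possible_transfers(destination_types):
    """Return the TRANSfer readings still allowed by the destination's possible types."""
    return {TRANSFER_FOR_DESTINATION[t] for t in destination_types}


# Syntactic projection for "MPLA": the article allows three categories,
# and the head-noun requirement of the "to" phrase allows only one.
after_article = {"noun", "adjective", "adverb"}
head_of_to_phrase = {"noun"}
assert intersect_categories([after_article, head_of_to_phrase]) == {"noun"}

# Semantic projection: while MPLA may be a location or an agent, both readings survive.
assert possible_transfers({"location", "agent"}) == {"PTRANS", "ATRANS"}

# Hypothetical later state: if further constraints left only "agent", one reading would remain.
assert possible_transfers({"agent"}) == {"ATRANS"}
```

The same set-narrowing pattern serves both steps, which mirrors the statement that meaning is learned by a similar but more complex application of the same principle.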
Similar resources
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...
Towards a Speech-to-Speech Translation System (1)
We describe our Universal Parser Architecture and its use in a speech translation system, based on the Machine Translation system under development at the Center for Machine Translation at Carnegie Mellon. To "understand" natural language, a system must have syntactic knowledge of the language and semantic knowledge of the domain. The Universal Parser Architecture allows grammar writers to deve...
Extending Netsniff
This technical report describes how to extend Netsniff with additional stream-level and packet-level parsers. It also describes how to extend the log file parser and database that were built to compute statistics on the data collected by Netsniff. Keywords: Netsniff, Extension, Stream-parser, Packet-parser
Extending WordNet using Generalized Automated Relationship Induction
This paper describes a Java package for automatically extending WordNet and other semantic lexicons. Extending these semantic lexicons by traditional means of hand labeling word relationships is a very expensive and laborious process. We used machine learning techniques to automatically extract relationships between words from a given text corpus. The package is made to be very flexible, allowi...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...
Towards a Parser for Mathematical Formula Recognition
For the transfer of mathematical knowledge from paper to electronic form, the reliable automatic analysis and understanding of mathematical texts is crucial. A robust system for this task needs to combine low level character recognition with higher level structural analysis of mathematical formulas. We present progress towards this goal by extending a database-driven optical character recogniti...